cabinet 1
A Goal Without a Plan Is Just a Wish: Efficient and Effective Global Planner Training for Long-Horizon Agent Tasks
Shuzheng Si, Haozhe Zhao, Kangyang Luo, Gang Chen, Fanchao Qi, Minjia Zhang, Baobao Chang, Maosong Sun
Agents based on large language models (LLMs) struggle with blind trial-and-error and hallucinated actions in long-horizon tasks because they lack global planning. In this paper, we introduce a plan-and-execute framework and propose EAGLET, an efficient and effective planner training method that enhances the executor agent's planning abilities without human effort. Specifically, we train a plug-and-play global planner through a two-stage process: we first synthesize high-quality plans from an advanced LLM using our proposed homologous consensus filtering strategy and apply fine-tuning as a cold start; we then further improve the planner with a rule-based reinforcement learning stage driven by a novel executor capability gain reward, ensuring it can handle task instructions of varying difficulty. Experiments on three long-horizon agent tasks show that executor agents equipped with our planner outperform existing methods and achieve new state-of-the-art performance. Meanwhile, EAGLET reduces training costs by 8x compared to RL-based baselines and requires no manual effort or extra training data, offering an efficient and effective solution.
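The two training signals the EAGLET abstract names, homologous consensus filtering and the executor capability gain reward, can be made concrete. Below is a minimal Python sketch assuming a hypothetical `executor.run(task, plan=...)` interface that reports boolean success; the paper's exact filtering criterion and reward shaping may differ.

```python
# Sketch of EAGLET's two signals (interface and thresholds are assumptions,
# not the paper's definitions).

from statistics import mean

def consensus_filter(task, plan, executors, threshold=1.0):
    """Keep a synthesized plan only if homologous executor agents
    all (or nearly all) succeed on the task when given the plan."""
    successes = [ex.run(task, plan=plan).success for ex in executors]
    return mean(successes) >= threshold

def capability_gain_reward(task, plan, executor, n_rollouts=4):
    """Reward a plan by how much it lifts the executor's success rate
    relative to acting with no global plan at all."""
    with_plan = mean(executor.run(task, plan=plan).success for _ in range(n_rollouts))
    without_plan = mean(executor.run(task, plan=None).success for _ in range(n_rollouts))
    return with_plan - without_plan  # positive only if the plan actually helps
```

The gain-style reward is what lets the planner adapt to task difficulty: on tasks the executor already solves, a plan earns little, so training pressure concentrates on tasks where planning changes the outcome.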
ReflAct: World-Grounded Decision Making in LLM Agents via Goal-State Reflection
Jeonghye Kim, Sojeong Rhee, Minbeom Kim, Dohyung Kim, Sangmook Lee, Youngchul Sung, Kyomin Jung
Recent advances in LLM agents have largely built on reasoning backbones like ReAct, which interleave thought and action in complex environments. However, ReAct often produces ungrounded or incoherent reasoning steps, leading to misalignment between the agent's actual state and goal. Our analysis finds that this stems from ReAct's inability to maintain consistent internal beliefs and goal alignment, causing compounding errors and hallucinations. To address this, we introduce ReflAct, a novel backbone that shifts reasoning from merely planning next actions to continuously reflecting on the agent's state relative to its goal. By explicitly grounding decisions in states and enforcing ongoing goal alignment, ReflAct dramatically improves strategic reliability. This design delivers substantial empirical gains: ReflAct surpasses ReAct by 27.7% on average, achieving a 93.3% success rate in ALFWorld. Notably, ReflAct even outperforms ReAct with added enhancement modules (e.g., Reflexion, WKM), showing that strengthening the core reasoning backbone is key to reliable agent performance.
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- Law (0.94)
- Materials > Metals & Mining (0.46)
- Leisure & Entertainment > Games (0.45)
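A minimal sketch of the reflect-then-act step the ReflAct abstract describes, in contrast to ReAct's plan-next-action loop: before committing to an action, the agent emits a reflection grounding its believed state against the goal. The `llm` callable and the prompt wording are stand-ins, not the paper's templates.

```python
# ReflAct-style decision step (prompts are illustrative assumptions).

def reflact_step(llm, goal, observation, history):
    # First, reflect: ground the agent's current state relative to its goal.
    reflection = llm(
        f"Goal: {goal}\nObservation: {observation}\nHistory: {history}\n"
        "Reflect: where am I now relative to the goal, and what remains to be done?"
    )
    # Then act, conditioned on that state-goal reflection rather than on a
    # free-floating chain of thought.
    action = llm(
        f"Goal: {goal}\nObservation: {observation}\nReflection: {reflection}\n"
        "Next action:"
    )
    history.append((reflection, action))
    return action
```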
TCPO: Thought-Centric Preference Optimization for Effective Embodied Decision-making
Kechen Jiao, Zhirui Fang, Jiahao Liu, Bei Li, Qifan Wang, Xinyu Liu, Junhao Ruan, Zhongjian Qiao, Yifan Zhu, Yaxin Xu, Jingang Wang, Xiu Li
Exploiting the generalization capabilities of vision-language models (VLMs) in context-specific, dynamic tasks for embodied artificial intelligence remains a significant challenge. Although supervised fine-tuned models align better with the real physical world, they still exhibit sluggish responses and hallucination issues in dynamically changing environments, necessitating further alignment. Existing post-SFT methods, which rely on reinforcement learning and chain-of-thought (CoT) approaches, are constrained by sparse rewards and action-only optimization, resulting in low sample efficiency, poor consistency, and model degradation. To address these issues, this paper proposes Thought-Centric Preference Optimization (TCPO) for effective embodied decision-making. Specifically, TCPO introduces a stepwise preference-based optimization approach that transforms sparse reward signals into richer step-level sample pairs. It emphasizes alignment of the model's intermediate reasoning process, mitigating model degradation. Moreover, by incorporating an Action Policy Consistency Constraint (APC), it further imposes consistency constraints on the model output. Experiments in the ALFWorld environment demonstrate an average success rate of 26.67%, a 6% improvement over RL4VLM, validating the effectiveness of our approach in mitigating model degradation after fine-tuning. These results highlight the potential of integrating preference-based learning with CoT processes to enhance the decision-making capabilities of vision-language models in embodied agents.
- North America > United States > Washington > King County > Seattle (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
- Asia > China > Beijing > Beijing (0.04)
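One way the TCPO abstract's idea of turning sparse episode rewards into stepwise preference pairs could look in code: pair aligned steps from a successful and a failed rollout of the same task, then apply a DPO-style preference loss over the thought-plus-action log-probabilities with an added consistency penalty. The pairing rule, the preference term, and the APC penalty below are illustrative guesses, not the paper's definitions.

```python
# Rough TCPO-style sketch (pairing rule and loss terms are assumptions).

import torch
import torch.nn.functional as F

def step_preference_pairs(success_traj, failure_traj):
    """Turn one sparse success/failure signal into many step-level pairs
    by aligning steps of a winning and a losing rollout of the same task."""
    return list(zip(success_traj, failure_traj))

def tcpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, action_kl, beta=0.1, lam=0.01):
    # DPO-style stepwise preference term over thought+action log-probs
    # of the preferred (w) vs. dispreferred (l) steps ...
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    pref = -F.logsigmoid(margin).mean()
    # ... plus an APC-like penalty keeping the action distribution close
    # to the reference policy, guarding against model degradation.
    return pref + lam * action_kl.mean()
```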
GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks
Haoqiang Kang, Enna Sachdeva, Piyush Gupta, Sangjae Bae, Kwonjoon Lee
Vision-Language Models (VLMs) have recently shown promising advancements in sequential decision-making tasks through task-specific fine-tuning. However, common fine-tuning methods, such as Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) techniques like Proximal Policy Optimization (PPO), have notable limitations: SFT assumes Independent and Identically Distributed (IID) data, while PPO focuses on maximizing cumulative rewards. These limitations often restrict solution diversity and hinder generalization in multi-step reasoning tasks. To address these challenges, we introduce GFlowVLM, a novel framework that fine-tunes VLMs using Generative Flow Networks (GFlowNets) to promote the generation of diverse solutions for complex reasoning tasks. GFlowVLM models the environment as a non-Markovian decision process, allowing it to capture the long-term dependencies essential for real-world applications. It takes observations and task descriptions as inputs to prompt chain-of-thought (CoT) reasoning, which subsequently guides action selection, and it uses task-based rewards to fine-tune the VLM with GFlowNets. This approach enables VLMs to outperform prior fine-tuning methods, including SFT and RL. Empirical results demonstrate the effectiveness of GFlowVLM on complex tasks such as card games (NumberLine, BlackJack) and embodied planning tasks (ALFWorld), showing improved training efficiency, solution diversity, and stronger generalization across both in-distribution and out-of-distribution scenarios.
- Asia > China > Ningxia Hui Autonomous Region > Yinchuan (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
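For the GFlowVLM entry above, a compact sketch of the trajectory-balance objective commonly minimized in GFlowNet-style fine-tuning of autoregressive policies. The learned `log_Z`, the per-step log-prob tensor, and the reward floor are assumptions for illustration; the paper's exact objective may differ.

```python
# Trajectory balance (TB) sketch for GFlowNet fine-tuning of a VLM policy
# whose "states" are full interaction histories (non-Markovian). For
# autoregressive generation each sequence has a unique construction order,
# so the backward-policy term log P_B is zero and is omitted here.

import torch

def trajectory_balance_loss(log_Z, log_pf_steps, reward, eps=1e-8):
    """TB loss for one trajectory: (log Z + sum log P_F - log R(x))^2.

    log_Z:        learned scalar, log of the partition function
    log_pf_steps: tensor of per-step forward log-probs from the VLM policy
    reward:       task-based terminal reward R(x), floored at eps > 0
    """
    log_r = torch.log(torch.as_tensor(reward, dtype=torch.float32) + eps)
    return (log_Z + log_pf_steps.sum() - log_r) ** 2
```

Because the objective matches the policy's sampling probability to the reward rather than maximizing expected return, it tends to preserve multiple high-reward solutions instead of collapsing onto one, which is the diversity property the abstract emphasizes.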